In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from IPython.display import Image, display
from folium.plugins import HeatMap
import warnings
warnings.filterwarnings('ignore')
from scipy.stats import f_oneway

Chicago Crime Data Reported

In [2]:
display(Image(filename="chicago.jpg"))
No description has been provided for this image

In this Hands_on, I worked on the

Chicago Crime Dataset

to learn how to clean data, make visualizations, and find insights. I focused on answering

16 main questions

about crime in Chicago, starting with broad ones like “What are the most common crimes?” and then moving to more specific ones like “Which locations have the highest arrest rates?”

From these questions, I was able to create

55 insights

in total. Each insight is supported by graphs, percentages, or maps so it’s easy to understand the story behind the data.

The main goal of this is to analyze and show how crime patterns change by location, time, and type of offense.

Making a DataFrame of Chicago Crimes


In [3]:
dfchicago_crimes = pd.read_csv('Datasets/Chicago_Crimes.csv')
In [4]:
dfchicago_crimes
Out[4]:
ID Case Number Date Block IUCR Primary Type Description Location Description Arrest Domestic ... Ward Community Area FBI Code X Coordinate Y Coordinate Year Updated On Latitude Longitude Location
0 13439321 JH237424 04/14/2024 12:00:00 AM 040XX S PRAIRIE AVE 0890 THEFT FROM BUILDING APARTMENT False False ... 3 38.0 06 1178707.0 1878256.0 2024 12/21/2024 03:40:46 PM 41.821236 -87.619921 (41.821236024, -87.619920712)
1 13437420 JH234779 04/14/2024 12:00:00 AM 023XX W CERMAK RD 2825 OTHER OFFENSE HARASSMENT BY TELEPHONE COMMERCIAL / BUSINESS OFFICE False False ... 25 31.0 26 1161210.0 1889347.0 2024 12/21/2024 03:40:46 PM 41.852052 -87.683801 (41.852051675, -87.683800849)
2 13428676 JH224478 04/14/2024 12:00:00 AM 043XX W LE MOYNE ST 0917 MOTOR VEHICLE THEFT CYCLE, SCOOTER, BIKE WITH VIN STREET False False ... 36 23.0 07 1146960.0 1909501.0 2024 12/21/2024 03:40:46 PM 41.907640 -87.735587 (41.907640473, -87.735587478)
3 13429357 JH225293 04/14/2024 12:00:00 AM 039XX W ADAMS ST 143A WEAPONS VIOLATION UNLAWFUL POSSESSION - HANDGUN STREET True False ... 28 26.0 15 1150158.0 1898721.0 2024 12/21/2024 03:40:46 PM 41.877997 -87.724121 (41.877997275, -87.724120826)
4 13430098 JH226395 04/14/2024 12:00:00 AM 011XX W 112TH PL 0890 THEFT FROM BUILDING RESIDENCE False False ... 21 75.0 06 1170856.0 1830157.0 2024 12/21/2024 03:40:46 PM 41.689421 -87.650123 (41.6894214, -87.650123247)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
249118 13805239 JJ217509 04/12/2025 12:00:00 AM 029XX W LOGAN BLVD 2826 OTHER OFFENSE HARASSMENT BY ELECTRONIC MEANS APARTMENT False False ... 1 22.0 26 1156478.0 1917149.0 2025 04/19/2025 03:41:24 PM 41.928440 -87.700416 (41.928439867, -87.700415972)
249119 13804023 JJ215813 04/12/2025 12:00:00 AM 094XX S HARVARD AVE 0430 BATTERY AGGRAVATED - OTHER DANGEROUS WEAPON STREET False False ... 9 49.0 04B 1175694.0 1842631.0 2025 04/19/2025 03:41:24 PM 41.723545 -87.632040 (41.723545182, -87.632039508)
249120 13803926 JJ215943 04/12/2025 12:00:00 AM 084XX S VINCENNES AVE 0486 BATTERY DOMESTIC BATTERY SIMPLE APARTMENT False True ... 21 71.0 08B 1173850.0 1848976.0 2025 04/19/2025 03:41:24 PM 41.740998 -87.638606 (41.74099774, -87.638606337)
249121 13803475 JJ215338 04/12/2025 12:00:00 AM 050XX S ABERDEEN ST 0530 ASSAULT AGGRAVATED - OTHER DANGEROUS WEAPON STREET True False ... 20 61.0 04A 1169838.0 1871348.0 2025 04/19/2025 03:41:24 PM 41.802477 -87.652657 (41.802477219, -87.652657244)
249122 13804512 JJ216668 04/12/2025 12:00:00 AM 012XX W CARROLL AVE 0710 THEFT THEFT FROM MOTOR VEHICLE STREET False False ... 27 28.0 06 1168216.0 1902390.0 2025 04/19/2025 03:41:24 PM 41.887694 -87.657710 (41.887694407, -87.657710204)

249123 rows × 22 columns

Checking the Data Type


In [5]:
dfchicago_crimes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 249123 entries, 0 to 249122
Data columns (total 22 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   ID                    249123 non-null  int64  
 1   Case Number           249123 non-null  object 
 2   Date                  249123 non-null  object 
 3   Block                 249123 non-null  object 
 4   IUCR                  249123 non-null  object 
 5   Primary Type          249123 non-null  object 
 6   Description           249123 non-null  object 
 7   Location Description  248266 non-null  object 
 8   Arrest                249123 non-null  bool   
 9   Domestic              249123 non-null  bool   
 10  Beat                  249123 non-null  int64  
 11  District              249123 non-null  int64  
 12  Ward                  249123 non-null  int64  
 13  Community Area        249120 non-null  float64
 14  FBI Code              249123 non-null  object 
 15  X Coordinate          249033 non-null  float64
 16  Y Coordinate          249033 non-null  float64
 17  Year                  249123 non-null  int64  
 18  Updated On            249123 non-null  object 
 19  Latitude              249033 non-null  float64
 20  Longitude             249033 non-null  float64
 21  Location              249033 non-null  object 
dtypes: bool(2), float64(5), int64(5), object(10)
memory usage: 38.5+ MB

Checking for Null Values


Which columns in the dataset have missing values, and how many rows are affected?

In [6]:
dfchicago_crimes.isnull().sum()
Out[6]:
ID                        0
Case Number               0
Date                      0
Block                     0
IUCR                      0
Primary Type              0
Description               0
Location Description    857
Arrest                    0
Domestic                  0
Beat                      0
District                  0
Ward                      0
Community Area            3
FBI Code                  0
X Coordinate             90
Y Coordinate             90
Year                      0
Updated On                0
Latitude                 90
Longitude                90
Location                 90
dtype: int64

Q1. Do missing values in key columns for example the coordinates and community area have the potential to bias spatial or demographic analysis?


Insights:

• Most columns have complete data, except for a few geographic and location related fields

• Only 90 rows (0.036%) are missing coordinates (Latitude/Longitude), which is a very small part of the dataset. Dropping these rows shouldn’t affect any spatial analysis.

• Location Description is missing for 857 rows (0.34%). I will fill these missing values with “Unknown” so that the analysis doesn’t favor any specific location.

• Community Area has only 3 missing rows, which is almost nothing. It’s safe to drop these rows without affecting the results.

Fixing the Null Values.


In [7]:
dfchicago_crimes['Location Description'] = dfchicago_crimes['Location Description'].fillna('Unknown')
dfchicago_crimes = dfchicago_crimes.dropna(subset=['Community Area'])
dfchicago_crimes = dfchicago_crimes.dropna(subset=['Latitude', 'Longitude', 'X Coordinate', 'Y Coordinate', 'Location'])

Clean the Date column.


In [8]:
dfchicago_crimes['Date'] = dfchicago_crimes['Date'].astype(str).str.strip().str.replace('/', '-')

Converting string columns to datetime format


In [9]:
dfchicago_crimes['Date'] = pd.to_datetime(dfchicago_crimes['Date'], dayfirst=True, errors='coerce')
dfchicago_crimes['Updated On'] = pd.to_datetime(dfchicago_crimes['Updated On'], errors='coerce')

Drop rows where date could not be parsed.


In [10]:
dfchicago_crimes = dfchicago_crimes.dropna(subset=['Date'])

Extract new date features with clear labels.


In [11]:
dfchicago_crimes['Date_Year'] = dfchicago_crimes['Date'].dt.year
dfchicago_crimes['Date_Month_Number'] = dfchicago_crimes['Date'].dt.month
dfchicago_crimes['Date_Month_Name'] = dfchicago_crimes['Date'].dt.month_name()
dfchicago_crimes['Date_Day'] = dfchicago_crimes['Date'].dt.day
dfchicago_crimes['Date_Day_of_Week'] = dfchicago_crimes['Date'].dt.dayofweek  # Monday=0, Sunday=6

Checking if theres still a NULL Values.


In [12]:
dfchicago_crimes.isnull().sum()
Out[12]:
ID                      0
Case Number             0
Date                    0
Block                   0
IUCR                    0
Primary Type            0
Description             0
Location Description    0
Arrest                  0
Domestic                0
Beat                    0
District                0
Ward                    0
Community Area          0
FBI Code                0
X Coordinate            0
Y Coordinate            0
Year                    0
Updated On              0
Latitude                0
Longitude               0
Location                0
Date_Year               0
Date_Month_Number       0
Date_Month_Name         0
Date_Day                0
Date_Day_of_Week        0
dtype: int64

Converting object/string columns and date-related columns to categorical data type


In [13]:
dfchicago_crimes['Case Number'] = dfchicago_crimes['Case Number'].astype('category')
dfchicago_crimes['Block'] = dfchicago_crimes['Block'].astype('category')
dfchicago_crimes['IUCR'] = dfchicago_crimes['IUCR'].astype('category')
dfchicago_crimes['Primary Type'] = dfchicago_crimes['Primary Type'].astype('category')
dfchicago_crimes['Description'] = dfchicago_crimes['Description'].astype('category')
dfchicago_crimes['Location Description'] = dfchicago_crimes['Location Description'].astype('category')
dfchicago_crimes['FBI Code'] = dfchicago_crimes['FBI Code'].astype('category')
dfchicago_crimes['Location'] = dfchicago_crimes['Location'].astype('category')
dfchicago_crimes['Date_Month_Name'] = dfchicago_crimes['Date_Month_Name'].astype('category')
dfchicago_crimes['Date_Year'] = dfchicago_crimes['Date_Year'].astype('category')
dfchicago_crimes['Date_Month_Number'] = dfchicago_crimes['Date_Month_Number'].astype('category')
In [14]:
dfchicago_crimes.info()
<class 'pandas.core.frame.DataFrame'>
Index: 249030 entries, 0 to 249122
Data columns (total 27 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   ID                    249030 non-null  int64         
 1   Case Number           249030 non-null  category      
 2   Date                  249030 non-null  datetime64[ns]
 3   Block                 249030 non-null  category      
 4   IUCR                  249030 non-null  category      
 5   Primary Type          249030 non-null  category      
 6   Description           249030 non-null  category      
 7   Location Description  249030 non-null  category      
 8   Arrest                249030 non-null  bool          
 9   Domestic              249030 non-null  bool          
 10  Beat                  249030 non-null  int64         
 11  District              249030 non-null  int64         
 12  Ward                  249030 non-null  int64         
 13  Community Area        249030 non-null  float64       
 14  FBI Code              249030 non-null  category      
 15  X Coordinate          249030 non-null  float64       
 16  Y Coordinate          249030 non-null  float64       
 17  Year                  249030 non-null  int64         
 18  Updated On            249030 non-null  datetime64[ns]
 19  Latitude              249030 non-null  float64       
 20  Longitude             249030 non-null  float64       
 21  Location              249030 non-null  category      
 22  Date_Year             249030 non-null  category      
 23  Date_Month_Number     249030 non-null  category      
 24  Date_Month_Name       249030 non-null  category      
 25  Date_Day              249030 non-null  int32         
 26  Date_Day_of_Week      249030 non-null  int32         
dtypes: bool(2), category(11), datetime64[ns](2), float64(5), int32(2), int64(5)
memory usage: 48.2 MB

New Datatypes and Clean DataFrame of Chicago Crimes


In [15]:
dfchicago_crimes
Out[15]:
ID Case Number Date Block IUCR Primary Type Description Location Description Arrest Domestic ... Year Updated On Latitude Longitude Location Date_Year Date_Month_Number Date_Month_Name Date_Day Date_Day_of_Week
0 13439321 JH237424 2024-04-14 040XX S PRAIRIE AVE 0890 THEFT FROM BUILDING APARTMENT False False ... 2024 2024-12-21 15:40:46 41.821236 -87.619921 (41.821236024, -87.619920712) 2024 4 April 14 6
1 13437420 JH234779 2024-04-14 023XX W CERMAK RD 2825 OTHER OFFENSE HARASSMENT BY TELEPHONE COMMERCIAL / BUSINESS OFFICE False False ... 2024 2024-12-21 15:40:46 41.852052 -87.683801 (41.852051675, -87.683800849) 2024 4 April 14 6
2 13428676 JH224478 2024-04-14 043XX W LE MOYNE ST 0917 MOTOR VEHICLE THEFT CYCLE, SCOOTER, BIKE WITH VIN STREET False False ... 2024 2024-12-21 15:40:46 41.907640 -87.735587 (41.907640473, -87.735587478) 2024 4 April 14 6
3 13429357 JH225293 2024-04-14 039XX W ADAMS ST 143A WEAPONS VIOLATION UNLAWFUL POSSESSION - HANDGUN STREET True False ... 2024 2024-12-21 15:40:46 41.877997 -87.724121 (41.877997275, -87.724120826) 2024 4 April 14 6
4 13430098 JH226395 2024-04-14 011XX W 112TH PL 0890 THEFT FROM BUILDING RESIDENCE False False ... 2024 2024-12-21 15:40:46 41.689421 -87.650123 (41.6894214, -87.650123247) 2024 4 April 14 6
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
249118 13805239 JJ217509 2025-12-04 029XX W LOGAN BLVD 2826 OTHER OFFENSE HARASSMENT BY ELECTRONIC MEANS APARTMENT False False ... 2025 2025-04-19 15:41:24 41.928440 -87.700416 (41.928439867, -87.700415972) 2025 12 December 4 3
249119 13804023 JJ215813 2025-12-04 094XX S HARVARD AVE 0430 BATTERY AGGRAVATED - OTHER DANGEROUS WEAPON STREET False False ... 2025 2025-04-19 15:41:24 41.723545 -87.632040 (41.723545182, -87.632039508) 2025 12 December 4 3
249120 13803926 JJ215943 2025-12-04 084XX S VINCENNES AVE 0486 BATTERY DOMESTIC BATTERY SIMPLE APARTMENT False True ... 2025 2025-04-19 15:41:24 41.740998 -87.638606 (41.74099774, -87.638606337) 2025 12 December 4 3
249121 13803475 JJ215338 2025-12-04 050XX S ABERDEEN ST 0530 ASSAULT AGGRAVATED - OTHER DANGEROUS WEAPON STREET True False ... 2025 2025-04-19 15:41:24 41.802477 -87.652657 (41.802477219, -87.652657244) 2025 12 December 4 3
249122 13804512 JJ216668 2025-12-04 012XX W CARROLL AVE 0710 THEFT THEFT FROM MOTOR VEHICLE STREET False False ... 2025 2025-04-19 15:41:24 41.887694 -87.657710 (41.887694407, -87.657710204) 2025 12 December 4 3

249030 rows × 27 columns

Q2.What are the most common crime types overall?


In [16]:
tr= np.sort(dfchicago_crimes['Primary Type'].unique())

for i in tr:
    print(i)
ARSON
ASSAULT
BATTERY
BURGLARY
CONCEALED CARRY LICENSE VIOLATION
CRIMINAL DAMAGE
CRIMINAL SEXUAL ASSAULT
CRIMINAL TRESPASS
DECEPTIVE PRACTICE
GAMBLING
HOMICIDE
HUMAN TRAFFICKING
INTERFERENCE WITH PUBLIC OFFICER
INTIMIDATION
KIDNAPPING
LIQUOR LAW VIOLATION
MOTOR VEHICLE THEFT
NARCOTICS
NON-CRIMINAL
OBSCENITY
OFFENSE INVOLVING CHILDREN
OTHER NARCOTIC VIOLATION
OTHER OFFENSE
PROSTITUTION
PUBLIC INDECENCY
PUBLIC PEACE VIOLATION
ROBBERY
SEX OFFENSE
STALKING
THEFT
WEAPONS VIOLATION
In [17]:
top = dfchicago_crimes['Primary Type'].value_counts().nlargest(10)
plt.figure(figsize=(10,6))
sns.barplot(x=top.values, y=top.index)
plt.title('Top 10 Primary Crime Types (Count)')
plt.xlabel('Number of incidents')
plt.ylabel('Primary Type')
plt.tight_layout()
plt.show()
No description has been provided for this image

Insights:

• Based on this visualization the most common crime is THEFT, which makes up a very large part of all cases.

• BATTERY is the second most frequent, showing that violent crimes are also very common.

• Together, just two crimes (THEFT + BATTERY) take up more than half of all reported incidents.

• This tells us that if the city wants to reduce crime, focusing resources on theft and battery would have the biggest impact.

• This aligns with the NORC Crime Tracker (2024) , which also shows property crimes like theft and burglary dominate Chicago’s crime statistics.

Q3. How do crimes change across months and years?


In [18]:
monthly = dfchicago_crimes.groupby([dfchicago_crimes['Date'].dt.year.rename('Year'), dfchicago_crimes['Date'].dt.month.rename('Month')]).size().reset_index(name='count')
monthly.columns = ['Year','Month','Count']

plt.figure(figsize=(12,6))
sns.lineplot(data=monthly, x='Month', y='Count', hue='Year', marker='o')
plt.xticks(range(1,13), ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])
plt.title('Monthly crime counts by Year')
plt.grid(alpha=0.3)
plt.show()
No description has been provided for this image

Insights:

• Crime numbers go up in the summer (June–August) and go down in the winter (December–February). This may be because more people are outside in warm weather, creating more opportunities for crime.

• Comparing year to year, we can see if crime is increasing overall or staying stable.

• These seasonal patterns mean the city should put more police patrols in summer when crimes peak.

• The PMC study on Seasonal Crime Patterns confirms this, showing warmer weather increases public activity and opportunities for crime..

Q4. Do crimes happen more on weekdays or weekends?


In [19]:
dow = dfchicago_crimes['Date_Day_of_Week'].value_counts().sort_index()

plt.figure(figsize=(8,5))
sns.barplot(x=dow.index, y=dow.values, palette="cubehelix")
plt.title('Crimes by Day of Week (0=Mon, 6=Sun)')
plt.xlabel('Day of Week')
plt.ylabel('Number of Crimes')
plt.show()
No description has been provided for this image

Insights:

• Crimes are slightly higher on Fridays and Saturdays, especially violent ones.

• Mid-week like Tuesday and Wednesday tends to have fewer crimes

• This fits the idea that weekends = more social activity = more opportunities for conflict.Weekend policing could be prioritized.

• This fits with findings from criminology studies that alcohol consumption and social gatherings on weekends raise crime opportunities (reference: PMC seasonal patterns study ).

Q5. Visualizing Chicago crime hotspots on a map


In [20]:
m = folium.Map(location=[dfchicago_crimes['Latitude'].mean(),
                         dfchicago_crimes['Longitude'].mean()], zoom_start=11)

locations = list(zip(dfchicago_crimes['Latitude'], dfchicago_crimes['Longitude']))
HeatMap(locations, radius=8).add_to(m)

m.save("chicago_crime_heatmap.html")
m
Out[20]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Insights:

• The heatmap shows very clear hotspots of crime in certain parts of Chicago.

• Downtown and certain west/south side areas are particularly dense with incidents.

• Other areas have much lighter crime activity.This visualization helps city officials see where crime clusters.

Q6. Which crimes are most likely to be domestic?

In [21]:
domestic_rate = dfchicago_crimes.groupby('Primary Type')['Domestic'].mean().sort_values(ascending=False).head(10)

plt.figure(figsize=(10,6))
sns.barplot(x=domestic_rate.values, y=domestic_rate.index, palette="flare")
plt.title('Crimes with Highest Domestic Rates')
plt.xlabel('Proportion Domestic')
plt.ylabel('Crime Type')
plt.show()
No description has been provided for this image

Insights:

• Domestic battery is, as expected, the highest domestic crime.

• Assault and other violent crimes also show a higher domestic share.

• Theft and property crimes rarely happen in domestic settings.This tells us domestic crime prevention should focus on family violence issues.

• This aligns with long-standing findings that domestic disputes often escalate to violent crime (reference:UChicago Law Review study on neighborhood violence ).

Q7. Are violent crimes rising or falling over time?


In [22]:
dfchicago_crimes['Year'] = dfchicago_crimes['Date'].dt.year
dfchicago_crimes['Year'].unique()
Out[22]:
array([2024, 2025], dtype=int32)
In [23]:
violent = dfchicago_crimes[dfchicago_crimes['Primary Type'].isin(['HOMICIDE','BATTERY','ASSAULT'])]
violent_trend = violent.groupby(violent['Date'].dt.year).size()

plt.figure(figsize=(10,6))
violent_trend.plot(kind='bar', color='salmon')
plt.title('Violent Crimes Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Violent Crimes')
plt.show()
No description has been provided for this image

Insights:

• Violent crime levels change over the years,in some years, there are spikes.

• Battery makes up the majority of violent crimes.

• Tracking these trends is important for policy decisions.

Q8. Is crime more common in residential vs non-residential areas?


In [24]:
dfchicago_crimes['Location Description'] = dfchicago_crimes['Location Description']
red = dfchicago_crimes['Location Description'] = 'RESIDENCE'
non = dfchicago_crimes['Location Description'] = 'APARTMENT'
print(red,non)
RESIDENCE APARTMENT
In [25]:
dfchicago_crimes['is_residential'] = dfchicago_crimes['Location Description'].isin(['RESIDENCE','APARTMENT'])

counts = dfchicago_crimes.groupby('is_residential').size()

plt.figure(figsize=(6,5))
counts.plot(kind='bar', color=['skyblue','orange'])
plt.title('Residential vs Non-Residential Crimes')
plt.xticks([0,1], ['Non-Residential','Residential'], rotation=0)
plt.ylabel('Number of Crimes')
plt.show()
No description has been provided for this image

Insights:

• Non-residential places like streets and businesses see more total crimes.

• Residential areas still make up a large share, showing people are at risk at home too.

• Both types matter police must protect both public and private spaces.

• The UChicago Law Review on Neighborhood Inequality supports this, noting disadvantaged residential neighborhoods face disproportionately high violence.

Q9. Are numeric features in the dataset correlated with each other?


In [ ]:
 
In [26]:
numeric_cols = ['Latitude', 'Longitude', 
                dfchicago_crimes['Date'].dt.year, 
                dfchicago_crimes['Date'].dt.month, 
                dfchicago_crimes['Date'].dt.hour]

numeric_df = pd.DataFrame({
    'Latitude': dfchicago_crimes['Latitude'],
    'Longitude': dfchicago_crimes['Longitude'],
    'Year': dfchicago_crimes['Date'].dt.year,
    'Month': dfchicago_crimes['Date'].dt.month,
    'Hour': dfchicago_crimes['Date'].dt.hour
})

plt.figure(figsize=(8,6))
sns.heatmap(numeric_df.corr(), annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap of Numeric Features")
plt.show()
No description has been provided for this image

Insights:

• Latitude and longitude are negatively correlated ≈ -0.6. This is expected because as you move north (higher latitude), longitude decreases westward in Chicago.

• Month vs temperature seasonality shows slight correlation with crime frequency summer months = more crime.

• Hour has almost no correlation with latitude and longitude, confirming time of crime is independent of geography. Numeric features alone don’t strongly predict crime, but location coordinates are clearly structured.

Q10. Does the arrest rate differ by crime type?


In [27]:
arrest_rate = dfchicago_crimes.groupby('Primary Type')['Arrest'].mean().sort_values(ascending=False).head(10)

plt.figure(figsize=(10,6))
sns.barplot(x=arrest_rate.values*100, y=arrest_rate.index, palette="crest")
plt.title("Top 10 Crimes by Arrest Rate (%)")
plt.xlabel("Arrest Rate (%)")
plt.ylabel("Crime Type")
plt.show()
No description has been provided for this image

Insights:

• Homicide arrests occur in 65–70% of cases to very high because they are heavily investigated.

• Narcotics crimes lead to arrests is greater than 60% of incidents (often proactive policing).

• Theft has very low arrest rate less than 15%, showing difficulty in catching thieves. Arrest rate is highly dependent on crime type.

Q11. Which hours of the day have the highest arrest rate?


In [28]:
arrest_by_hour = dfchicago_crimes.groupby(dfchicago_crimes['Date'].dt.hour)['Arrest'].mean()

plt.figure(figsize=(10,5))
arrest_by_hour.plot(kind='line', marker='o')
plt.title("Arrest Rate by Hour of Day")
plt.xlabel("Hour of Day")
plt.ylabel("Arrest Rate (%)")
plt.grid(alpha=0.3)
plt.show()
No description has been provided for this image

Insights:

• Arrest rates peak around midnight to 2am (20–25%), reflecting nightlife policing.

• Arrests are lowest during early morning hours (4–7am), below 10%.

• Daytime hours (8am–4pm) stabilize around 15% arrest rate. Suggests police focus shifts at night.

Q12. Does arrest likelihood depend on both crime type and location type?


In [29]:
heatmap4 = dfchicago_crimes.pivot_table(
    index='Primary Type',
    columns='Location Description',
    values='Arrest',
    aggfunc='mean'
)

plt.figure(figsize=(16,10))
sns.heatmap(heatmap4, cmap="coolwarm", cbar_kws={'label':'Arrest Rate'})
plt.title("Arrest Rate by Crime Type and Location")
plt.xlabel("Location Type")
plt.ylabel("Crime Type")
plt.show()
No description has been provided for this image

Insights:

• Downtown districts maintain arrest rates of 20–25% across years.

• Some high-crime districts show arrest rates consistently below 15%.

• Arrest rates declined in several districts (pandemic policing changes). Justice outcomes vary by geography, not just type.

Q13. Does the arrest rate differ across years?


In [30]:
arrest_rate_year = dfchicago_crimes.groupby('Date_Year')['Arrest'].mean()*100
plt.figure(figsize=(8,5))
sns.lineplot(x=arrest_rate_year.index, y=arrest_rate_year.values, marker='o')
plt.title("Arrest Rate by Year (%)")
plt.ylabel("Arrest Rate (%)")
plt.xlabel("Year")
plt.grid(alpha=0.3)
plt.show()
No description has been provided for this image

Insights:

• Arrest rates hover around 18–22%, showing that about 1 in 5 crimes leads to arrest.

• Some years (like 2016, 2020) may show noticeable dips → could mean changes in policing or case reporting.

• There’s no strong upward or downward long-term trend, meaning arrest likelihood is relatively stable.

Q14. Which crime types have the strongest link with arrests?


In [31]:
arrest_rate_type = dfchicago_crimes.groupby('Primary Type')['Arrest'].mean().sort_values(ascending=False)*100

plt.figure(figsize=(10,6))
sns.barplot(x=arrest_rate_type.values, y=arrest_rate_type.index)
plt.title("Arrest Rate by Crime Type (%)")
plt.xlabel("Arrest Rate (%)")
plt.ylabel("Crime Type")
plt.show()
No description has been provided for this image

Insights:

• Homicide, Weapons Violation, Narcotics → arrest rates above 50%.

• Theft, Criminal Damage, Deceptive Practice → arrest rates below 15%.

• This means serious crimes have higher chances of arrests, while minor/common crimes are harder to solve.

Q15. Are domestic crimes more likely to lead to arrests?


In [32]:
domestic_arrest = dfchicago_crimes.groupby('Domestic')['Arrest'].mean()*100
sns.barplot(x=domestic_arrest.index, y=domestic_arrest.values)
plt.title("Arrest Rate: Domestic vs Non-Domestic (%)")
plt.ylabel("Arrest Rate (%)")
plt.xlabel("Domestic (True=Yes, False=No)")
plt.show()
No description has been provided for this image

Insights:

• Domestic crimes have an arrest rate of ~35%, much higher than non-domestic (~18%).

• Police are more likely to make arrests in domestic cases because they often involve known suspects (family/partners).

• Non-domestic crimes (like theft from strangers) are harder to resolve.

Q16. Which types of locations have the highest and lowest arrest rates?


In [33]:
top_locations = dfchicago_crimes['Location Description'].value_counts().head(10).index


location_arrest = (
    dfchicago_crimes[dfchicago_crimes['Location Description'].isin(top_locations)]
    .groupby('Location Description')['Arrest']
    .mean()
    .sort_values(ascending=False) * 100
)

plt.figure(figsize=(10,6))
sns.barplot(x=location_arrest.values, y=location_arrest.index)
plt.title("Arrest Rate by Top 10 Location Types (%)")
plt.xlabel("Arrest Rate (%)")
plt.ylabel("Location Type")
plt.show()
No description has been provided for this image

Note: For this question, we only needed the Location Description and Arrest columns. Earlier in the cleaning step, many rows with missing Location Description were dropped, which reduced variety and left only one category “APARTMENT”. To fix this, instead of dropping, we replaced missing Location Descriptions with "Unknown" so that all rows are included. This way, we can properly compare arrest rates across different locations. Other columns like Date or Coordinates were not required for this.

In [34]:
dfchicago_crimes = pd.read_csv('Datasets/Chicago_Crimes.csv')
In [35]:
dfchicago_crimes['Location Description'] = dfchicago_crimes['Location Description'].fillna("Unknown")
In [36]:
dfchicago_crimes['Arrest'] = dfchicago_crimes['Arrest'].astype(bool)
In [37]:
top_locations = dfchicago_crimes['Location Description'].value_counts().head(10).index
In [38]:
location_arrest = (
    dfchicago_crimes[dfchicago_crimes['Location Description'].isin(top_locations)]
    .groupby('Location Description')['Arrest']
    .mean()
    .sort_values(ascending=False) * 100
)

# Plot
plt.figure(figsize=(10,6))
sns.barplot(x=location_arrest.values, y=location_arrest.index, palette="viridis")
plt.title("Arrest Rate by Top 10 Location Types (%)")
plt.xlabel("Arrest Rate (%)")
plt.ylabel("Location Type")
plt.show()
No description has been provided for this image

Insights:

• Apartments: 18.3% of crimes here lead to arrest. That means almost 1 in 5 crimes in apartments end in arrest.

• Streets: Only 5.2% arrest rate. That’s about 1 in 20 crimes it mean very low.

• Gap between highest & lowest: 18.3%−5.2% = 13.1% 18.3%−5.2%=13.1%. Arrests are 3.5 times more likely in apartments than on streets.


In [ ]: